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ABSTRACT 



The purpose of this report is to identify and synthesize 
existing practices used in developing collections of free third-party 
Internet resources that support higher education and research. A review of 
these practices and the projects they support confirms that developing 
collections of free Web resources is a process that requires its own set of 
practices, policies, and organizational models. Where possible, the report 
recommends those practices, policies, and models that have proved to be 
particularly effective in terms of sustainability, scalability, 
cost-effectiveness, and applicability to their stated purpose. The report 
outlines similarities and differences between print and free Web resources 
and describes how the nature and complexity of free Web resources comply with 
or challenge traditional library practices and services pertaining to analog 
collections. Several methods of data gathering were used, including 
interviews with librarians, Web browsers, and subject gateways. The findings 
that emerged from the research done for this report document that developing 
and managing collections of free Web resources has wide-ranging, long-term 
implications for human resources, organizational issues, and fiscal matters 
that extend well beyond the circle of individuals responsible for selecting 
these resources. In addition to the specific collection development policy 
issues of collection scope, selection criteria, and resource discovery, a 
discussion of the broader issues pertaining to managing collections of free 
Web resources is included. The report consists of 10 sections: (1) 

"Introduction"; (2) "Why Select Free Third-Party Web Sites?"; (3) 
"Identification, Evaluation, and Selection"; (4) "Access: Resource Discovery 
and Added-Value Functions"; (5) "Data Management: Collection Maintenance, 
Management, and Preservation"; (6) "Multilinguality"; (7) "User Support; (8) 
"Human Resources: Organizational and Financial Issues"; (9) "Future 
Directions: Nurturing Sustainability"; and (10) "References." (Contains 51 
references.) (AEF) 
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Preface 

In January 2000, the Digital Library Federation (DLF) launched an informal 
survey to identify the major challenges confronting research libraries that use 
information technologies to fulfill their curatorial, scholarly, and cultural mis- 
sions. With astonishing unanimity of opinion and clarity of voice, respon- 
dents pointed to digital collection development as their single greatest chal- 
lenge. Whether the digital information came from a commercial publisher or 
from a digitization unit within the library, it seemed to exist under a cloud of 
profound and unsettling uncertainty. Would it be useful and useable in its 
present or intended form, or require additional work on the part of catalog- 
ed, systems staff, or subject bibliographers? What new demands would its 
availability make on library reference staff? What level of continued invest- 
ment would be necessary to ensure its accessibility on current hardware and 
software? 

The survey also revealed that leading research libraries had learned a 
great deal about their digital collections through experience. Though substan- 
tial, that learning had rarely been expressed outside the collection policies, 
working papers, and implementation guidelines that libraries create to coor- 
dinate and manage their collection development efforts. Accordingly, in April 
2000, the DLF commissioned three reports to address broader concerns about 
digital collections. The three reports deal respectively with commercial elec- 
tronic content, digital materials created from library holdings, and Web-based 
"gateways'' that link to selected Internet resources in the public domain. The 
reports mark a starting point for what we hope will emerge as an evolving 
publication series. 

Working to a common outline and based on learned experience, the au- 
thors demonstrate how decisions taken by a library when acquiring (or creat- 
ing) electronic information influence how, at what cost, and by whom the in- 
formation will be used, maintained, and supported. By assembling and 
reviewing current practice, the reports aim where possible to document effec- 
tive practices. In most cases, they are able at least to articulate the strategic 
questions that libraries will want to address when planning their digital col- 
lections. 

In this report, Louis Pitschmann deals with the widespread practice of 
listing "useful" Internet resources. Variously billed as subject gateways, Inter- 
net resource guides, and "related Internet resources," these inventories ap- 
pear on library Web pages across the country. Their construction has become 
a cottage industry, often fuelled by the voluntary and devoted effort of indi- 
viduals who have taken it upon themselves to identify and catalog worthy 
resources that occupy some definable segment of the World Wide Web. The 
redundancy involved in this work is as substantial as its long-term hidden 
costs. The author's treatment of the practice is critical, yet fair. He has few 
doubts about the value that such Internet resource guides offer to library us- 
ers. At the same time, he asks searching questions about whether they may be 
developed and maintained outside the mainstream of collection development 
efforts and without the resources that typically support such efforts. Drawing 
extensively on evolving international experience, Mr. Pitschmann's work is 
essential for any institution that seeks to build its collection in part through 
reference to "free" Internet resources. 
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1. INTRODUCTION 



T he purpose of this report is to identify and synthesize existing 
practices used in developing collections of free third-party 
Internet resources that support higher education and research. 
A review of these practices and the projects they support confirms 
that developing collections of free Web resources is a process that re- 
quires its own set of practices, policies, and organizational models. 
Where possible, the report recommends those practices, policies, and 
models that have proved to be particularly effective in terms of sus- 
tainability, scalability, cost-effectiveness, and applicability to their 
stated purpose. 

Background work for this report confirmed John Kirriemuir's 
observation that the selection of free third-party Web-based resourc- 
es in North American academic libraries remains largely the respon- 
sibility of individuals working alone (Kirriemuir 1999). This back- 
ground work also revealed that tasks ranging from identification of 
resources to their delivery through a library OPAC, Web page, or 
portal are seldom documented in a manner that would permit others 
to build upon existing practices, regardless of whether those practic- 
es demonstrate poor choices or good practices. This fact obtains 
whether one is looking for criteria used by an individual librarian or 
adapted by a library as institutional policy. For example, when inter- 
viewed about his Web selection criteria, one area studies librarian 
responded that he selected only "quality sites." When asked to de- 
fine "quality," he replied, "We all know how to select good books, 
and Web resources are no different." Third-party public domain 
Web resources, however, are fundamentally different from scholarly 
print and analog formats as well as from commercially produced 
digital resources. 

This report outlines the similarities and differences between 
print and free Web resources and describes how the nature and com- 
plexity of free Web resources comply with or challenge traditional 
library practices and services pertaining to analog collections. The 
report also recommends, where appropriate, certain practices that 
have proved effective in meeting desired goals in specific contexts. 
There is no single set of preferred or "best" practices that one should 
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follow in collecting free Web resources. What succeeds in one context 
might prove fruitless in another. The practices recommended in this 
report have proved effective in a specific project or show promise for 
broad applicability, regardless of the fiscal, technical, or human re- 
source parameters in any specific institutional setting. Ultimately, it 
is not the practice per se, but rather how well that practice lends it- 
self to a particular set of goals, workflows, and staffing preferences, 
that determines its effectiveness and value in any given setting. 

1.1. Methodology 

Because of the paucity of formal documentation, writing this paper 
required the use of several methods of data gathering. 

1. Interviews. Librarians were asked to identify sites they considered 
to be "high quality" and to describe their criteria for evaluating 
the quality of a site. Every attempt was made to avoid implying or 
providing a definition of "high quality" when eliciting the librari- 
ans' opinions. This methodology proved useful for several rea- 
sons. First, it demonstrated the degree of subjectivity in defining 
this term. It made it possible to identify a large number of sites in 
the humanities and social sciences, as well as in science, technolo- 
gy, and medicine (STM), that document their developers' selection 
criteria. Interviewees who are directly involved in "collecting" 
free Web resources were asked to describe their experiences hiring 
and training staff, establishing and applying selection and catalog- 
ing criteria, identifying resources, and developing end-user inter- 
faces. 

2. Web Browsers. Using basic and advanced search strategies, the 
phrase "selection criteria" was used to locate Web sites where 
these terms were used and where criteria for selecting Web-based 
resources were discussed. This methodology was the least valu- 
able of those used in researching this report, but it did confirm 
that little information on this topic is easily retrievable through 
browsers. Web browsers also permitted the identification of a 
number of highly developed subject gateways, the policies and 
practices of which proved very relevant to this report. 

3. Gateways. More than 50 subject gateways, ranging from small to 
large, were reviewed in an effort to locate and evaluate their selec- 
tion criteria, staffing patterns, and cataloging and related practic- 
es. Reviewing these sites and communicating with their staff was 
the most useful of all the forms of data gathering used for this re- 
port, and it yielded by far the most valuable findings. Creators of 
European and Australian sites have refined and articulated selec- 
tion and related criteria to a far greater degree than have their 
North American colleagues. The selection criteria and document- 
ed practices of gateways such as the Internet Scout Report, how- 
ever, approach or equal the high standards set by gateway cre- 
ators in the Nordic countries, the Netherlands, Germany, the 
United Kingdom, and Australia. 
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4. English Language. Because this report was written for professionals 
involved in developing digital collections in North American li- 
braries, only sites that provide documentation in English were 
evaluated. Owing to the extent to which the English language 
dominates the Web, limiting the report to English-language docu- 
ments did not restrict the scope of the investigations upon which 
it is based. On the contrary, many of the most promising practices 
discussed in this report are for gateway projects in non-English- 
speaking countries and in which the documentation and user in- 
terface are provided in English and at least one other language. 



1.2. Content 

This report focuses on issues pertaining to the development of sus- 
tainable collections of free Web resources. The findings that emerged 
from the research done for this report document that developing and 
managing collections of free Web resources have wide-ranging, long- 
term implications for human resources, organizational issues, and 
fiscal matters that extend well beyond the circle of individuals re- 
sponsible for selecting these resources. What one selects and how 
one wishes to deliver it to the end user frequently have significant 
implications for workloads and priorities in traditional cataloging 
and metadata departments, technical support units, and user-servic- 
es programs. Free Web resources are challenging how libraries have 
traditionally processed and added value to print and analog publica- 
tions. Many of the practices and policies developed by the projects 
reviewed for this paper shed light on how to address and manage 
the myriad access, staff development, user training, and cost issues 
associated with collecting free Web resources. Thus, in addition to 
the specific collection-development policy issues of collection scope, 
selection criteria, and resource discovery, a discussion of the broader 
issues pertaining to managing collections of free Web resources is 
included. Among these issues are the following: 

a. Added Value or Cataloging. What levels of description, ranging from 
free-text notes to MARC records with traditional subject headings 
and metadata, are necessary or adequate to provide efficient ac- 
cess to free Web resources? 

b. Access and Archiving. Options for integrating free Web resources 
into collections and services must be explored. For example, do 
OPACs provide sufficient and appropriate access? Are separate 
files required, or are new Web pages with a series of links ade- 
quate to ensure users will find these resources once they are se- 
lected? Once the resources are identified, how can extended avail- 
ability be assured, given the ephemeral nature of Web-based 
content? 

c. User Support. How do users learn of free Web resources after they 
have been selected? What are the end users' support needs, and 
what are the implications of these needs for staffing and services? 
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d. Human Resources. What expertise is needed to select wisely and 
efficiently? The qualifications for staff who select Web resources 
are not as clearly defined as they are for staff who select print- 
based and analog resources for research library collections. How 
does one incorporate selection of free Web sites, a new job respon- 
sibility, into existing workloads? What skills and expertise best 
qualify an individual to be involved in the selection, cataloging, 
and technical support processes? 

e. Workflow and Organizational Issues. Making free Web resources 
available to users directly has an effect on multiple work units and 
workflows. Decisions pertaining to Web resource selection, pro- 
cessing, and delivery have specific implications for preexisting 
workflows and priorities. In building sustainable collections of 
free Web resources, planners must recognize that there are connec- 
tions between decisions that relate to Web resources and decisions 
that relate to analog collections. 

f. Fiscal Implications. Free Web resources are not unlike gift collec- 
tions: their acquisition has direct budgetary and workload impli- 
cations. Identifying, selecting, and making available free Web re- 
sources incur costs to the library. These costs are not trivial, and 
they can ultimately influence both the selection and access deliv- 
ery processes. Identifying the implications of free Web content and 
at what cost it will be managed and made accessible to end users 
influences planning, outcomes, and costs throughout the library. 



1.3. Terminology 

The research performed for this report revealed that not all individu- 
als recognized for their Web resource expertise use technical terms or 
jargon uniformly. In some cases, variant terms were used synony- 
mously; in others, experts used the same terminology quite different- 
ly, or at least with a variant nuance. To prevent ambiguity and to en- 
sure consistent usage, it was deemed appropriate to define several 
terms. In reviewing these definitions, readers should bear in mind 
that much of the literature in which these terms are used most con- 
sistently has been prepared in the European Union, where British 
usage is more common. The definitions presented here, however, do 
not vary significantly from North American usage. 

Free. Describes Web-based resources for which no compensation 
is required by the creator or host site in order to have full access to 
the content. 

Collection. Two or more related items, the relationship of which is 
determined by the scope of the collection, as defined by the staff re- 
sponsible for building that collection. 

Gateway (i.e., subject or information gateway and synonymous 
with subject guide, subject page, megasite, virtual library, clearing- 
house, and, increasingly, portal). The Internet Detective defines a 
"subject gateway" as differing from a search engine in that the con- 
tent of a gateway has been selected through some form of human 
input, normally a critical evaluation by an information professional 
or subject expert. IMesh defines a subject gateway as "a web site that 
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provides searchable and browsable access to online resources fo- 
cused around a specific subject. Subject gateway resource descrip- 
tions are usually created manually. For this reason, they are usually 
superior to those available from a conventional web search engine" 
(IMesh Toolkit 2000). 

Andy Powell provides a detailed set of definitions for the Re- 
source Development Network (Powell 2000). 



2. WHY SELECT FREE THIRD-PARTY WEB SITES? 



Today researchers in all disciplines are creating scholarly Web-based 
resources, and students and faculty regularly browse the Internet in 
search of sites related to their interests. Some create profiles describ- 
ing their needs so that search engines or harvesters relying on user- 
friendly language-retrieval systems can locate sites matching a given 
profile and notify the user. What is the role for librarians, other sub- 
ject experts, and information specialists in developing relatively la- 
bor-intensive and costly handcrafted collections (i.e., subject guides, 
catalogs, gateways) of otherwise freely accessible Web sites? 

Many free Web-based resources are merely collections of links to 
other sites or combinations of links and scanned files of texts, imag- 
es, or sound and video. A growing number contain information not 
readily available elsewhere. Once found, many of these sites are easi- 
ly viewed and downloaded. More sophisticated sites may require 
specialized or domain-specific tools, technologies, or guidance to 
view or use certain types of content (e.g., geospatial, scientific, or sta- 
tistical information). These sites are difficult or impossible to find, 
even by scholars and information specialists. Search results can be so 
overwhelming that the user cannot be expected to evaluate them in a 
reasonable length of time. Moreover, organization within the Web 
presents most scholarly users with a technological labyrinth. Results 
may be far removed from, or totally unrelated to, the desired find- 
ings. Finally, the artificial intelligence technologies employed by the 
major Web discovery tools are insufficient to retrieve and adequately 
evaluate scholarly content. 

Many of the inadequacies of the Web are due to the fact that the 
Web is still best described as "quantity without quality." A more ac- 
curate statement would be that the Web reflects the full spectrum of 
quality, ranging from minimal or null to highly authoritative. Search- 
ing the Web on any topic will retrieve all information pertinent to 
one's query, but there will be no qualitative evaluation or filtering of 
the content. Even the relevance rankings offered by some browsers 
do not reveal the quality of the information retrieved; they reveal 
only the frequency of terms stated in the query. Thus, a recent 
google.com search for "Mozart" retrieved more than 800,000 sites. 
They offered information not only about the composer and his works 
but also about composers influenced by Mozart, opera houses and 
companies, music processors, programming software, greeting cards, 
museums, the Salzburg airport, Austrian bonbons, and even person- 
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al pages in which the name Mozart appeared. Refining the search to 
"Wolfgang Amadeus Mozart" reduced the number of hits by a factor 
of 10, to a still-unwieldy 78,000+ sites. "Mozart biography" retrieved 
322, and then "about 392" sites; "biography of Mozart" yielded 
"about 437" sites. 

In cases such as these, relevancy ranking can significantly facili- 
tate resource discovery. Search engines that display results according 
to relevance do so on the basis of the following factors: 

• search term frequency in the document 

• inverse document frequency of the search term (i.e., the lower the 
frequency of the term in the entire database, the higher the rele- 
vance) 

• length (density) of the document 

• date of the document 

• occurrence of the term in the document title or only within the 
document 

• proximity of two or more terms (the greater the distance between 
terms, the lower the relevance) 

• link popularity (i.e., the frequency with which other Web resourc- 
es link to the document); similar to citation frequency measures 
used to evaluate the importance of a print resource or its author 

Unfortunately, relevancy rankings do not ensure precise retriev- 
al. They do not provide the higher education community with an 
evaluation of the intellectual level of content, nor do they provide the 
levels of organization and selectivity students and faculty require to 
take full advantage of free Web-based information. Only when sites 
have been reviewed, evaluated, selected, and cataloged will users be 
spared the ambiguities resulting from the randomness and "quantity 
without quality" of Web search results. Until technological advances 
permit rapid and highly precise retrieval of only that free Web-based 
content that is both authoritative and useful in the context of higher 
education and research, expert human intervention will be needed. 

And therein lies the role of librarians and other subject special- 
ists. The value-added services that libraries have traditionally pro- 
vided for print formats need to be applied to free Web-based resourc- 
es as well. The selection and cataloging functions of a physical 
library assure users that titles found there have met predetermined 
quality criteria and have been described in a manner that facilitates 
their identification and retrieval. Moreover, the cataloging process 
provides the authoritative and consistent grouping of related materi- 
als so vital to browsing and to the winnowing-and-sifting process 
that characterizes learning and research in an academic setting. As 
Sarah Thomas states, "Libraries can add value by promoting filtering 
and ranking which would prefer [Web-based] resources . . . that meet 
a set of established criteria, such as having a strong likelihood of au- 
thenticity, accuracy, or endorsement by others of standing" (Thomas 
2000 ). 
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Another traditional library service that is absent in the Web envi- 
ronment is storage or archiving. Once print resources have been se- 
lected and integrated into a research collection, most of them will be 
retained indefinitely, or at least until their physical condition requires 
withdrawal or their content has been superseded to such a degree as 
to warrant replacement. This is not the case with Web-based infor- 
mation. Web content is not durable; there are no archival guarantees. 
Information available at any given moment can move or cease to ex- 
ist without warning. Were libraries to select and archive Web re- 
sources in much the same way as they have collected print formats, 
users could be assured indefinite (i.e., archival) retrieval and delivery 
of what is otherwise an ephemeral format. How best and at what 
cost free sites should be archived and by whom are questions cur- 
rently under discussion. (See Section 5: Data Management: Collec- 
tion Maintenance, Management, and Preservation for further discus- 
sion of issues pertaining to archiving Web-based information.) 

Librarians and other subject specialists have myriad opportuni- 
ties to select and organize Web-based information. The current reali- 
ties of the Web validate the need for human intervention in facilitat- 
ing access to Web-based resources within the higher education 
context. By selecting a subset of resources that meet predetermined 
criteria and by facilitating access to them, librarians impose a quality 
structure on those resources. Users can be assured that sites found in 
such collections — selected, evaluated, and described by persons with 
recognized subject expertise, and made available through a catalog 
that provides a single interface to an integrated collection — are of 
related quality. Users can also be assured that access to these sites 
will be stable, and that resource discovery will be tailored to the 
characteristics of the collection and its content, rather than to the fea- 
tures of a specific search engine. 

For a description of the role of librarians and subject experts as 
performed in an applied setting, see Lagoze and Fielding 1998. 



3. IDENTIFICATION, EVALUATION, AND SELECTION 



3.1. The Need for Web-Specific Selection Criteria 

During the second half of the twentieth century, the library literature 
devoted much space to collection development and management 
theory and practice as they pertain to print and analog formats. Li- 
brary school curricula and professional development workshops 
gave considerable attention to these topics as well as to selection 
practices and polices. Learning the component parts of collection 
policy statements and drafting sample statements were basic require- 
ments in library school programs, and for many years they provided 
the focus for professional development and continuing education 
forums. In addition, library administrators have traditionally expect- 
ed selectors to apply these theories and principles in formal policy 
statements. Although written policy statements frequently lie dor- 
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mant for extended periods, they steer day-to-day selection decisions 
and determine the nature and quality of collections over time. For- 
mally articulated collection-management policy statements are cru- 
cial to collection building in that they document the intended scope 
of a collection (i.e., what it excludes as well as what it includes) and 
explain historical strengths and weaknesses within a defined context. 

This long-established practice of documenting collection policies 
for print and analog formats notwithstanding, there was at the close 
of the 1990s a paucity of documentation describing the practices fol- 
lowed in North American academic libraries when selecting free 
Web resources. Many subject specialists developing guides to free 
Web resources do not recognize the need to articulate their collection 
policies, as even a cursory survey of the Web will reveal. Online poli- 
cy statements are rare; written documentation describing the goals 
and objectives in creating and maintaining these guides are rarer still. 

This absence of formal, well-articulated, and commonly accepted 
criteria or guidelines for selecting free Web resources is due in part to 
the way in which such resources have thus far been acquired and 
added to academic library collections. By and large, administrators 
and subject specialists alike have perceived the selection of digital 
resources to be an extension of the print selection process. Staff 
charged with selecting print resources are the primary selectors for 
all or most digital formats, whether or not those resources are free. 
Blurring distinctions further is that information resources produced 
by and for academic and research communities have many similari- 
ties, whether these resources are print-based, analog, or digital. 
Scholarly resources can be textual, image-based, numeric, or geospa- 
tial. Their creators can be personal, corporate, governmental, non- 
governmental, or even anonymous. Their content can be fact or fic- 
tion, original works of art or music, interpretive performances, or 
critical studies. Despite these differences, their single intended audi- 
ence remains the academic community. Given this scenario, the need 
to articulate new or additional selection criteria for free Web-based 
resources is not immediately apparent. Indeed, many of the evalua- 
tive criteria that apply to print-based publications also pertain, to 
varying degrees, in the Web environment. The most obvious of these 
criteria are content quality and programmatic need. 

As appropriate as these and other traditional selection criteria 
for print and analog formats may be, they quickly prove insufficient 
when selecting Web-based resources. Many print-based criteria lend 
themselves only partially to the evaluation of Web-delivered content; 
other criteria are not applicable at all. Web resources exhibit qualities 
that do not exist in print and analog formats and for this reason re- 
quire additional evaluative measures. To evaluate their content, for- 
mat, and dependence on technology and to describe their value and 
applicability to higher education and research, new criteria are needed. 

Free third-party Web resources are particularly challenging. 

Their availability does not bear the imprimatur or nihil obstat that the 
traditional scholarly vetting process confers on the printed word. 
Frequently, the authority of those who have created these resources 
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is not immediately discernible, and the veracity of their content can- 
not be surmised from the names and reputations of scholarly pub- 
lishing houses. The concept and role of publisher in the case of free 
resources are oftentimes vague. Is the publisher the site hosting the 
file where the content is found? Or is it the person or body that de- 
veloped the site? Further, most of the traditional services associated 
with marketing and distributing print resources are lacking. Vendors 
do not facilitate the acquisition of Web resources through any type of 
notification service, blanket order, or approval plan. National bibli- 
ographies do not record their existence. Rarely if ever do unsolicited 
mailings or the traditional print or electronic blurb announce their 
"publication." Only a small number, far fewer than one percent by 
anyone's measure, receive a formal review. Simply stated, the basic 
selection guidelines and principles of how to identify, evaluate, and 
acquire print-based and analog library materials — principles that are 
articulated so clearly in the collection-development canons — do not 
pertain in the environment of the Web. The successful inclusion of 
Web resources in academic library collections requires new or, at a 
minimum, additional selection criteria. 

Fortunately, the number of clearly articulated, high-quality selec- 
tion criteria for free Web resources is growing. The American Library 
Association (ALA) has promulgated basic guidelines for selecting 
Web sites suitable for public libraries, reference services, and the 
fields of education and business (2000, 1999, 1997). Some individuals 
have published well-written articles on general principles pertaining 
to selection of free Web resources (McGeachin 1998, Fedunok 2000, 
Sweetland 2000). A valuable collection of articles on major projects to 
harvest and organize Web-based content is The Amazing Internet 
Challenge: How Leading Projects Use Library Skills to Organize the Web 
(Wells, Calcari, and Koplow 1999). The extent to which staff have ap- 
plied or adapted these guidelines in other projects remains largely 
undocumented. 

Nevertheless, Web selection criteria are only rarely found in pro- 
fessional journals and other print formats. Instead, they are custom- 
arily found as Web-based documents, imbedded in scope statements 
and collection policies of the increasing number of subject gateways 
and in the background papers on resource discovery systems. For 
example, well-delineated and highly articulated criteria can be found 
at Web sites developed by the Internet Scout Report, the Research 
Discovery Network in Great Britain, sites in the Netherlands (e.g., 
DutchESS), and at various Scandinavian sites, such as those devel- 
oped at the Kuopio University Library in Finland, all of which are 
available in English. The Internet Detective and the DESIRE Fland- 
book are also valuable. 

What follows is both a synthesis and an amalgamation of criteria 
and practices for evaluating free Web sites described at these and re- 
lated sites in North America, Europe, and Australia. These criteria 
cover the full range of issues associated with evaluating free sites: 
provenance, authorship, content, design, user support, standards 
and technologies, navigation, and system integrity. The criteria have 
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been selected for inclusion in this paper because of their universality. 
They apply equally to scientific, technical, medical, social science, 
and humanities Internet resources. Their application in existing 
projects has led to the creation of consistent and coherent collections 
of "high-quality" content. The DESIRE Information Gateways Hand- 
book uses service-driven definitions to describe "high quality." Thus, 
"quality" depends on users' needs, and a "high-quality" site is one 
that meets their needs because it is relevant for their purposes. Fur- 
ther, quality is determined on the basis of inherent features of a re- 
source. Thus, high quality can be ascertained only if a skilled and 
knowledgeable human evaluates both user needs and the inherent 
features of the Web resource. Building on this premise, Kuopio Uni- 
versity Library staff expressly state that "Information content should 
comply with the needs of the frame organization" (Kuopio Universi- 
ty Library Group 1996). That is to say, the resources selected should 
support the teaching, research, and information services of the li- 
brary selecting them for their collection. 

3.2. Developing Selection Policies 

The need to define collection parameters and to articulate content 
criteria is equally important in the print and digital contexts. Devel- 
opment and maintenance of consistent and coherent collections of 
high-quality print or digital resources can succeed only in an envi- 
ronment in which value judgments are made on the basis of previ- 
ously defined and agreed-upon collection policies and selection crite- 
ria. Clearly articulated policies allow staff, over time, to select 
content that is consistent with their institution's mission and long- 
term goals. Formal policy statements afford users an opportunity to 
understand and evaluate the rationale underlying a collection and to 
determine its ultimate value in relation to their needs and expecta- 
tions. How users conceptualize the nature and features of a collec- 
tion significantly influences how they perceive that collection will 
serve their needs. 

The DESIRE Information Gateways Handbook notes the following 
advantages of developing formal policies specific to free Web re- 
sources and making them available online to both staff and users. 
Such policies: 

1. help users appreciate that the service is selective and quality con- 
trolled; 

2. help users understand the level of quality of information they will 
find when using the service; 

3. help staff be consistent in their selection and to maintain the qual- 
ity of the collection; 

4. help train new staff; and 

5. ensure consistency in collections that are developed by a distribut- 
ed team. 
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According to this list, formal policies are equally valuable for 
staff and users. Similarly, quality and consistency receive equal em- 
phasis; this underscores the need for a formal policy that staff mem- 
bers apply and users understand. Only when selection practices are 
broadly understood and consistently applied can quality be defined 
and identified. Through the consistent application of accepted prac- 
tices, the integrity of the collection (i.e., the value of the individual 
pieces and of the collection as a whole) will be assured and consis- 
tent. 

As these five points demonstrate, the fundamental principles 
and components found in policy statements pertaining to collections 
of print and analog formats apply to a certain extent to free Web re- 
sources. Understanding and acknowledging this fact will facilitate 
developing and articulating policies suited for free Web resources. 
This understanding will also assist staff in recognizing that Web re- 
sources are a continuum in the history of scholarly communication, 
and that their intellectual content is not inherently different from that 
of other formats. Nonetheless, while much of the rationale for devel- 
oping collection policies in the print and analog environment applies 
to developing collections of free Web resources, evaluating the full 
essence and nature of free Web resources requires additional evalua- 
tive criteria — criteria that evaluate the unique aspects of information 
contained in and delivered through free sites. 

3.3. Collection Scope Statement 

A well-developed collection policy includes a scope statement as well 
as selection criteria. The scope statement is the first filter in the re- 
source evaluation process. It defines the parameters of the collection 
in broad terms by describing what is included in and excluded from 
the collection. The justifications for these inclusions and exclusions 
should be clearly stated and based on the needs of the intended us- 
ers. At a minimum, the scope statement outlines the subjects includ- 
ed in the collection. It clarifies the acceptable sources for information 
(e.g., academic, government, commercial, nonprofit, personal re- 
sources) and the acceptable level(s) of difficulty (i.e., suitable for use 
in higher education settings, scholarly, kindergarten through twelfth 
grade, or popular) that will be considered. The scope statement also 
describes the resource types presumed to be relevant to the subject 
and the primary audience. The following list of resources included in 
the Humbul Humanities Hub can serve as a guide to defining collec- 
tion scope: 

• primary/original sources in electronic format (i.e., information 
mounted on the same server as the site and created or produced 
by the owners of the site) 

• secondary sources, whether published solely on the Web or as sur- 
rogates for printed editions (i.e., "third-party information," locat- 
ed and created at another site and made available through a link 
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that takes the user to the other site; secondary information may be 
primary information at its host site) 

• research projects and reports 

• bibliographies and bibliographic databases 

• electronic journals 

• e-mail lists where online archives exist 

• academic department Web pages 

• professional association Web pages 

• resource directories 

If resources that one might expect to find are excluded on the ba- 
sis of access restrictions (e.g., technological barriers, cost, or registra- 
tion) or of content, the scope statement should list these exclusions 
and give the rationale for not providing access to them. 

Other aspects of scope concern language and geographic param- 
eters. Although often overlooked because of the predominance of 
English-language resources on the Web, the increasing number of 
bilingual and multilingual high-quality sites calls for greater consid- 
eration and clarification of the language parameters imposed on a 
collection. The growing number of scientific sites, especially mathe- 
matics sites, where numbers, signs, symbols, and formulas are not 
language-specific demonstrates the importance of evaluating narrow 
or highly restrictive meta-language parameters before they are put in 
place. Similarly, geographic parameters need to be carefully re- 
viewed and clearly stated. The Web is one of the primary forces re- 
ducing, if not erasing, geographic barriers to communication and ac- 
cess to information. The decision to limit the scope of a collection to 
resources developed in a particular country or in a particular lan- 
guage may be appropriate, but the justifications for that decision 
need to be stated. 

RECOMMENDED EXAMPLES OF 
SCOPE STATEMENTS 

DutchESS. Dutch Electronic Subject Service. DutchESS Scope 
Policy. Available at http://www.kb.nl/dutchess/manual/ 
scope_eng.html. 

Humbul Humanities Hub. Available at http://www.humbul.ac.uk/ 
about/ colldev2.html. 

Internet Detective. Creating a Scope Policy for Your Service. Available 
at http://www.netskills.ac.uk/TonicNG/content/detective/56.html. 

Jennings, Simon. 2000. RDN Collections Development Framework. 
Version 1.1 (May). Available at http://www.rdn.ac.uk/publications/ 
policy.html. 

Library of Congress. BEOline: Selection Criteria for Resources to be 
Included on the BEOnline+ Project. Available at http://lcweb.loc.gov/ 
rr/business/beonline/ beonsel.html. 
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SOSIG Social Science Information Gateway. Training Materials and 
Support. Available at http://www.sosig.ac.uk/about_us/ 
user_support.html. 

3.4. Selection Criteria 

Although resources that do not fall within the parameters of a collec- 
tion scope statement should be rejected without further review, not 
all resources that do fall within the scope of a collection necessarily 
warrant inclusion. Further review and evaluation using established 
selection criteria assure that a collection will include only those ma- 
terials that best meet users' needs and expectations (i.e., those that 
meet the definition of "high quality"). Responsibility for setting 
these selection criteria varies from one gateway to another. Some 
gateways openly state that "no [selection] criteria were established" 
(e.g., ALA Machine- Assisted Reference Section [MARS] (1999)). Oth- 
ers provide brief selection criteria statements (e.g., Best of the Best 
Business Web Sites [BRASS]). An increasing number of specially 
funded projects are built on highly articulated selection policies (e.g., 
Dutch Electronic Subject Service [DutchESS]; Engineering Electronic 
Library [EEL], Sweden; and Internet Scout Report). 

The selection criteria described in the selection policies and col- 
lection statements reviewed for this paper can be categorized into 
four groups: context, content, form /interface, and technical criteria. 
Each of the four groups comprises multiple criteria. No one criterion 
will adequately evaluate a site, and frequently several criteria over- 
lap and intertwine — a process that blurs their definitions yet ulti- 
mately reveals more effectively the quality of the site under evalua- 
tion. A review of the collections using the following criteria did not 
reveal that any of them are more important than any other; more- 
over, no stated or implied priority order for applying these criteria 
can be found. Therefore, the order in which criteria and their catego- 
ries are presented here should not be construed to imply any hierar- 
chy or relative importance. Rather, they appear in an order that 
might facilitate the culling process. Their inclusion is based on their 
repeated occurrence among selection criteria used and described by 
various projects and programs committed to the development of 
high-quality collections of free Web resources. 

3.4.1. Context Criteria 

"Context" applies to the origin (provenance) of a site and its content, 
as well as to the suitability of a new resource to an existing collec- 
tion. How a site meshes with and enhances existing or anticipated 
content within a collection will determine the quality of the collec- 
tion over time. 

3.4.1.1. Provenance. The origin or source of a site reveals and confirms 
much about its value and provides important information about its 
overall quality. Parsing a site's URL often yields sufficient informa- 
tion to judge the affiliation of its creator and the reliability of its server. 
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3.4.1.2. Relationship to Other Resources. Like print resources, free Web 
resources can stand alone or in aggregate with other resources. As 
with any collection, however, the value of a collection of Web sites 
lies in the integrity of its individual components. The greater the de- 
gree to which each site (component) within a collection relates to and 
enhances others within the same collection, the greater its value of 
the collection as a whole. Evaluations of free Web sites have shown 
that many of them contain links to the same sites. This redundant 
content will increase the size, but not the quality, of the collection. 
Each site added to a collection must be viewed as an integral part of 
a larger mosaic. Redundant, superfluous, unrelated, or poorly suited 
pieces will not enhance the collection; they will only encumber it and 
ultimately discourage or confuse users. 

3.4.2. Content Criteria 

Content is arguably the most important criterion used when select- 
ing resources, and it must apply equally to all sites. The term applies 
to the information contained in a resource, and it is used both quali- 
tatively and quantitatively. A high-quality site with little content may 
or may not prove to be a useful site; however, the content of a low- 
quality site that has high quantity requires close scrutiny, for if little 
else exists on a specific subject, its mere availability may prove use- 
ful. Content should be evaluated on the basis of multiple factors and 
ultimately judged on the degree to which it supports the purpose of 
the collection to which it will be added. 

3 A. 2.1. Validity. Editors, reviewers, and publishers carefully vet the 
content of printed scholarly resources before they are published. This 
is not true in the case of free Web resources. Anyone with certain ba- 
sic skills and access to server space can make information accessible 
to anyone who happens upon it. Web -based information that is 
flawed, whether intentionally or inadvertently, is not always easily 
discernible from information that has been carefully and thoroughly 
verified. Especially when combined with high-quality Web features 
such as attractive graphics and slick navigational functions, misinfor- 
mation, false claims, and factual errors may not be self-evident. For 
example, an essay by I. Newton on the theory of gravity or by W. 
von Braun on rocketry may not contain the ideas of the great scien- 
tists these names imply, but rather the work of Ian Newton and Wes- 
ley von Braun, 12-year-olds hard at work on their middle school sci- 
ence projects. 

The validity of free Web resources should be measured on the 
basis of several factors. Is the source of the content clearly identified? 
Does the URL support the claims of the "author's" affiliation and 
credentials? Is contact information readily available, and, if so, what 
does it reveal about the person or team responsible for the content? 

Is the information available in print? If so, does the print version im- 
ply or state a critical review or validation of the information con- 
tained therein? To what extent has the Web resource been vetted by a 
third party; has it passed through any kind of "quality filter"? Is 
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there evidence of or description of the extent and nature of quality 
checks, review, or evaluation of the content? Is the source of the in- 
formation adequately described? Is the research process leading to 
the creation of the content described? Are references cited? Is a bibli- 
ography or webography provided? 

3. 4.2. 2. Accuracy. Whereas validity measures the degree of objective 
truth, accuracy is a measure of the degree of correctness in the de- 
tails. A resource may exhibit validity but lack a high decree of accu- 
racy. Not all books that have successfully passed through the vetting 
process because of the validity of their content are as highly re- 
viewed when evaluated for the accuracy of their supporting details. 
To some degree, the same is true of Web site content, which may ex- 
hibit inaccuracies caused by errors in data entry or by intentional bi- 
ases of the person or team who developed the content. Even errors in 
spelling and keying are important indicators of the accuracy of the 
content. A high error rate in one aspect of a site (e.g., keyboarding) is 
often a good indicator of other potential flaws. Many of the measures 
for judging validity also apply to evaluating accuracy. Judging accu- 
racy is difficult. It requires subject expertise and can prove to be 
highly subjective. 

3.4.2. 3. Authority. The authority of a site is dependent on the reputa- 
tion, expertise, and credentials of the person(s) responsible for creat- 
ing the site and providing access to it (in traditional terms, the au- 
thor and publisher). The experience and credentials of individuals 
who create free Web sites vary radically. Authors may be enthusiastic 
hobbyists or recognized authorities who have devoted years of 
study to the topic. Does the site provide biographical information or 
resumes about the author? Knowing whether the site content is at- 
tributable to Sir Isaac Newton or to the hypothetical Ian Newton re- 
ferred to earlier may provide an important measure of the site's au- 
thority. Similarly, the location of the server may prove valuable in 
evaluating authority. Servers maintained by colleges, universities, 
museums, scholarly societies, or professional associations imply high 
levels of expertise in the development of site content. URLs contain- 
ing a tilde (~), even if on a server maintained by such institutions or 
organizations, require closer evaluation, because a tilde normally sig- 
nifies a personal page that is merely maintained at that location. Oth- 
er indicators of authority are postal and e-mail addresses that can 
support authors' claims regarding their credentials and affiliations. 
The primary measures of authority are 

• the creator of the site 

• the creator's reputation 

• where the server is located 

• how many other sites link to it 

Sites that do not identify authorship require particularly close 
evaluation through the application of other content criteria. Ano- 
nymity is normally not associated with high-quality content. 
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3 A2A. Uniqueness. Uniqueness is a measure of the amount of prima- 
ry (i.e., original) information contained in a site. It also is a measure 
of the extent to which information at a site under review is duplicat- 
ed in sites that have already been selected for the collection. Some 
degree of content redundancy is acceptable; one common character- 
istic of free Web resources is that their content duplicates that of oth- 
er sites. The context and manner in which duplicate information is 
presented at other sites may argue in favor of selecting sites that con- 
tain information currently in a collection. Redundancy, however, 
must be evaluated carefully. The nature of digital information and its 
accessibility by multiple simultaneous users nullify all legitimate ar- 
guments for duplicate copies of information in print and analog col- 
lections. Users searching a specific collection will not be well served 
if their search yields multiple hits, all of which point to the same in- 
formation and accessible through multiple sites. Sites that contain 
primary information not available through other sites, depending on 
how the site measures up against other criteria, will undoubtedly 
prove to be of greater value than will sites containing secondary in- 
formation — unless the secondary information offers significant add- 
ed value. " About this site" links can prove highly beneficial in evalu- 
ating uniqueness. An evaluation of the URLs to which the site links 
will reveal whether they point to information at the site (primary in- 
formation) or to external sites that have probably been created by 
someone else. Links to external sites can be desirable, particularly if 
they offer significant added value. 

3A.2.5. Completeness. Completeness does not mean comprehensive 
coverage of information or exhaustive treatment of a topic. Rather, it 
refers to the availability of content at a particular site. The phrase 
"under construction" signals that the creators of the site are eager to 
inform others of their work but that the content or site design re- 
mains incomplete. However, the absence of the phrase "under con- 
struction," similar wording, or icons conveying that message does 
not assure that the content is indeed complete. A site is incomplete if 
it points to non-networked resources or to print resources for the full 
"edition" of the "work" or if links are grayed out. Some links may 
not be grayed out but point only to empty files. The content may fail 
to support the purpose of the site; this is a serious indication that the 
site is not complete. The extent to which the content of a site is incom- 
plete and the length of time it remains incomplete affect the quality of 
the site and the quality of any site or collection that links to it. 

3.42.6. Coverage. Coverage refers to the depth to which a subject is 
treated. The term is used both qualitatively and quantitatively. How 
adequately the site treats the topic and to what degree (i.e., depth) it 
treats it determine the quality of coverage. Is the topic sufficiently 
represented, or are the various aspects of the topic treated superfi- 
cially or not at all? Is there primary information? If little or none, 
how successfully does the site provide linkage to primary content 
maintained at other sites? What are the gaps in coverage? Even the 
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finest print collections have lacunae, but there is a limit to the extent 
and nature of gaps, in both print and digital collections, beyond 
which the limited coverage affects the quality of a specific item or the 
collection generally. Links that are present, but broken, adversely af- 
fect coverage. Indexes, contents pages, and bibliographies facilitate 
the evaluation of the coverage; however, sites that are limited only to 
links or that consist only of enumerative bibliographies may prove 
inadequate if the bibliographic information is not supplemented 
with abstracts or links to the full texts. On the other hand, if the site 
provides the only bibliography on the subject or provides more sub- 
stantive or current content than do other bibliographies, it may be an 
important site solely on the basis of its bibliographic coverage. At 
this time in the history of the Internet, users should not expect to 
find that coverage of a topic provided by a free Web site necessarily 
equals that of many print resources. Internet publishing, though ma- 
turing rapidly, is in its incunabulum phase. 

3A.2.7. Currency. Currency pertains to the degree to which the site is 
up-to-date; i.e., the extent to which it presents prevailing opinions, 
ideas, concepts, scientific findings, theories, and practices relating to 
the subject. Policy statements, for example, on the selection of "elec- 
tronic resources" written in the early 1990s and concerned primarily 
with the evaluation and acquisition of information on CD-ROMs 
would appropriately receive a low currency score today because the 
digital formats currently available were unforeseen at the time the 
site was created and criteria for their evaluation had not yet been ar- 
ticulated. Currency, therefore, refers to content that describes the cur- 
rent thinking and the context in which higher education finds itself 
currently. Just as a library collection of print materials must reflect 
the latest research, so too should high-quality free Web site content. 
(See also a discussion of Information Integrity in section 3.4.4. 1.) 

3.4.2. 8. Audience. Audience is what traditional print selection criteria 
describe as "intellectual level of content." That is, for whom is the 
information intended? Once the intended audience is discerned, the 
person selecting the site should ascertain how well the site meets the 
needs and interests of that group. A high-quality site on organ trans- 
plants that is created for middle-school students will not serve the 
needs of university students. The site itself could be outstanding, but 
if it fails to meet the needs of its intended users, it is of low quality. 

3 . 4 . 3 . Form/Use Feature (Accessibility) Criteria 
Form criteria are features that determine how the content is present- 
ed and how accessible it is. They include, for example, organization 
and user support. How readily a user can access content at a site, dis- 
play it on a monitor, and download or print content depends on site 
design, user aids, and software applications. Unlike print resources, 
Web sites are not uniform in structure. Their access and navigation 
are neither uniform nor based on centuries of tradition-bound orga- 
nizational concepts. Web resources lack the physical structure and 
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content sequence (e.g., title page, foreword, introduction, table of 
contents, contents, and indexes) that characterize how printed matter 
is organized and that facilitate access. 

Well-developed and organized digital content is far more than a 
series of pages to be read sequentially or accessed by page number 
or by specific indexing terms. Rather, it is multifunctional, allowing 
dynamic searching and downloading of content. Display, however, is 
a two-dimensional image that lacks the physical cues that permit or 
enhance use. Links and icons facilitate access to the content and per- 
mit the user to comprehend the purpose and content of a resource 
and use it effectively. Because these features may differ in quality, 
form (use features) must be as carefully evaluated as is content when 
selecting Web resources. 

3.4.3. 1. Composition and Site Organization. Because of the peer-review 
and editorial processes associated with print publications, an analy- 
sis of how information in a print format is organized, structured, and 
arranged requires only minimal evaluation during the selection pro- 
cess. Editors and reviewers contribute greatly to the final form of a 
printed work; spelling errors, bad grammar, poor style, and poor or- 
ganization of content are removed in the vetting, revision, and edito- 
rial processes. Web resource creation does not consistently include 
equally rigorous evaluation and revision as does information in print 
format; in fact, the most extensive evaluation of composition and or- 
ganization may occur after completion and then only by end-users. 
How well a site is composed and organized will determine how ac- 
cessible users find the content of any Web resource. Thus, the selec- 
tion process must include an evaluation of whether Web site content 
is logically or consistently organized or even divided into logical, 
manageable components that meet the needs of the intended users. 
Further, evaluators should take into consideration whether design 
enhances or hinders accessibility. A site may prove be "overly de- 
signed" if aesthetic features (e.g., wallpaper, graphics) require an un- 
necessary use of plug-ins or hinder access because they make images 
slower to load. Not all users have state-of-the-art computers or high- 
speed modem connections. Ease of access is relative and should 
weigh heavily in site evaluation (Wells, Calcari, and Koplow 1999, 
210). For a good discussion of issues pertaining to composition and 
site organization, see "Training Materials and Support," on the Social 
Science Information Gateway (SOSIG) Web site. 

3.4.3.2. Navigational Features. In addition to software applications 
(such as special viewers), layout, design, search functions, and user 
aids can facilitate or impede navigation of a site. Each of these fea- 
tures requires a separate evaluation, but it is their successful combi- 
nation that produces a high-quality site. One measure of navigation- 
al quality is browsability. How easily can the user browse the 
content? Are logically organized subsets of related information 
grouped so that manageable amounts of data or information can be 
browsed? Browsability is directly related to the value of a search 
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function within a resource. Does one exist, and is it adequately de- 
signed and described? The availability of keyword search functions 
combined with Boolean operators should be expected. 

The value of search results, however, depends on the quality of 
indexing. Whereas search functions pertain to how one searches; in- 
dexing pertains to choice of terms and to the depth to which terms 
are linked to content. Indexing and searching should, therefore, be 
evaluated separately. Superior search functions provide little added 
value if poor quality indexing standards or practices have been ap- 
plied. Once a search has located information within the resource, 
how conveniently can the user print or download relevant content? 
For example, can a user print or download a single file of documents 
that is composed of a series of separate pages? Some site criteria sug- 
gest that substantive content should be no more than three clicks 
away, and that all links should be unambiguous and make it clear to 
the user where links are leading (see SOSIG). Some navigational fea- 
tures (e.g., tables of contents and indexes linked to content and but- 
tons or icons directing users "home," "forward," and "back") are 
quite simple. 

3. 4.3. 3. Recognized Standards and Appropriate Technologies. Closely re- 
lated to composition and navigational features are the standards and 
technologies used in developing a site, for they determine the degree 
to which a user may access and use Web-based content. Today, acces- 
sibility weighs heavily in decisions pertaining to both technology 
and design, and closer attention is paid to developing and adhering 
to minimum design standards that will assure access to the largest 
number of users possible. The U.S. Department of Education Office 
of Civil Rights has defined these standards as those that ensure 
"timeliness of delivery, accuracy of the translation, and provision in 
a manner and medium appropriate to the significance of the message 
and the abilities of the individual with the disability" (Waddell 1998). 
Currently, the implementation of the Americans with Disabilities Act 
is receiving close review in higher education circles, for accessibility 
to the Web obtains whether one wishes to assure access by individu- 
als with disabilities, distance learners, or users with slow modems 
and low bandwidths. The World Wide Web Consortium (W3C) has 
issued Web Content Accessibility Guidelines 1.0 (Chisholm, Vander- 
heiden, and Jacobs 1999). 

Before selecting free Web sites, one must consider the range of 
these standards and technologies and determine which are most crit- 
ical for prospective users. Two fundamental questions are whether 
sites function in generally available environments or whether special 
extensions are required. All users expect content to load quickly and 
access to be rapid and available at any time. The Internet Detective 
provides important introductory guidelines for evaluating accessibil- 
ity and for determining how it is affected by the standards and tech- 
nologies that designers employ. For example, does the resource use 
proprietary extensions to HTML that some browsers cannot recog- 
nize? Can the information be accessed if the user has keyboard-only 
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navigation capabilities? The presence of metadata, alternative text 
when images are switched off, text captioning for audio material, 
and links and downloading instructions to any software that is re- 
quired to use the resource are key to evaluating the extent to which 
currently accepted or recommended standards and technology have 
been used in developing a site. Simply stated, are the technologies 
used appropriate for the content and the intended audience? One 
basic question that should not be overlooked is whether a site offers 
users more than they might find in a print resource containing the 
same information. For further details on this subject, see the Internet 
Detective. 

3. 4.3.4. User Support. User support relates to content (e.g., scope state- 
ments and collection policies) and technological applications. A well- 
developed Web resource provides information that answers users' 
questions as they arise. Frequently, this information is provided 
through frequently asked questions (FAQs), which anticipate the 
most commonly requested or needed information about using the 
site. 

FAQs, however, provide only static support. In some cases, this 
is sufficient. For example, if a particular viewer or audio software is 
necessary or preferable to provide full access to content, a simple 
statement to that effect may prove adequate for the user to proceed. 
When, however, interactive support would be more beneficial to the 
user, help screens or contact information should be easy to locate and 
user-friendly. "Contact Us," "Help Desk," or similar links (e.g., e- 
mail, postal address, phone numbers, office hours, individuals' 
names) should be included in the design and be easily identifiable. 

3. 4.3.5. Terms and Conditions. Access to some free Web sites requires 
that users first agree to specific conditions. These may range from 
simple registration to user fees or agreements to download informa- 
tion only under specific conditions. A great deal of content delivered 
through the Web that supports higher education and research is ac- 
cessible without any restrictions. Where restrictions do exist, one 
must evaluate whether they are appropriate. If registration is re- 
quired, why is it necessary? How will user information be retained 
and used or shared with third parties? Registration information may 
be gathered to assist site developers in monitoring what and how 
users access content in order to improve content coverage and access 
over time. It may also be gathered for reasons that are not apparent 
or that go unstated. Where user fees are applicable, are they appro- 
priate? If fees are required to access a site, does the service point only 
to resources that are themselves free? If so, does added value at the 
site warrant the required cost to users? Terms and conditions of use 
per se may not be inappropriate, but their validity should be con- 
firmed before a site is added to a gateway or OPAC. 
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3A.3.6. Rights Legitimacy. Related to terms and conditions of use is the 
question of whether those responsible for the content and access to 
the site have the right to place restrictions on the content at that site. 
It is relatively easy to claim rights to information, but such claims 
may be unfounded and, therefore, not legitimate. Proving the legiti- 
macy of such claims is difficult and time-consuming. 

3.4.4. Process or Technical Criteria 

Among the most frequently cited characteristics of Web resources is 
volatility. Links within a site may abruptly cease to function; an en- 
tire site may disappear without warning. Free Web resources come 
with no guarantees that their content or accessibility will remain con- 
stant or improve over time. In extreme cases, an entire resource may 
cease to exist or change its URL without directing users how to ac- 
cess the same resource at a new URL. Process criteria are those tech- 
nical features that measure the integrity of a site and the availability 
of content reported to be provided by the site. They are the functions 
and features that permit the end user to access the content selected 
and described by the provider (i.e., author or creator). 

3.4.4. 2. Information Integrity. Information integrity, or maintenance, 
pertains to the intellectual content offered by a site and is measured 
by how successfully the value of the content remains current or im- 
proves over time. Determining the value of content at any site re- 
quires knowing when a site was created, at what frequency updates 
and revisions are scheduled, and the date of the last revision. With- 
out one or more of these "time stamps," one cannot know whether 
content is current. Criteria for evaluating these dates and intervals 
between revisions vary greatly, depending on the nature of the con- 
tent. Time-sensitive data require frequent verification and revision; 
other types of information are quite durable. Information that is not 
time-sensitive may be well served by static resources that are updat- 
ed only infrequently or not at all. An important indicator of informa- 
tion integrity is knowing whether the provider or creator of the site 
is likely to maintain the resource over time. For example, resources 
developed by students, especially those developed as part of a short- 
term project, or by staff supported by one-time project funding, may 
not be maintained over time. The shelf life of such resources is fre- 
quently short, and the reliability of their content decreases at an ac- 
celerating rate over time. (See also Content Currency, Section 3.4.2.7.) 

Site content that is revised periodically presents its own range of 
challenges. Whereas superseded information in print format remains 
accessible through the retention of earlier printings and editions, su- 
perseded Web content frequently ceases to exist. It is now, however, 
an increasingly accepted practice to archive superseded content 
when that content has potential durability or value over time. 

3.4.4.2. Site Integrity. Site integrity pertains to the stability a site ex- 
hibits over time and to how a site is administered and maintained. It 
is, therefore, the responsibility of the site manager or Web master. 
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Even static resources require oversight, lest access to the resource it- 
self ceases. Site integrity requires a commitment to ongoing mainte- 
nance. Broken links must be fixed promptly. Updates and revisions 
should be noted and reflected in revision numbers. Whether super- 
seded content is archived, the URL where that content is maintained, 
and how requests to access the archive should be made is critical in- 
formation that should be included in FAQs or through other forms of 
information pertaining to the site. Without adequate information on 
these issues, the integrity of a site will be poor or nonexistent. 

3.4.4.3. System Integrity. System integrity is a measure of technical 
performance. The term refers to the stability and accessibility of the 
server hosting the resource. Assuring the quality of system integrity 
is the responsibility of the system administrator, not the creator of 
the information or the site manager. Server stability is perhaps the 
single most important criterion in judging the quality of a resource. 
The quality of the server ultimately determines the value of a Web 
resource. A server that experiences more traffic than it can handle 
will not provide consistent ease of access and will reduce the value 
of the resource, regardless of the value of its content. Frequent down- 
time or poor response time renders a resource virtually inaccessible. 
Anticipated downtime should be announced in advance, and all 
downtime should be reported, along with a projected date and time 
when the resource will again be available. 

Important evaluators when judging system integrity are whether 
a site is mirrored and whether one can access it multiple times with- 
in a specified period (e.g., a certain number of times per day or 
week). Internet Scout Report staff, for example, check the availability 
of each site three times in the days prior to making it available. Staff 
responsible for selecting free Web resources at the Kuopio University 
Library in Finland require that links work sufficiently well on the 
third and final attempts to access a site over set intervals (Kuopio 
University Library Working Group 1996). 

RECOMMENDED EXAMPLES OF 
SELECTION CRITERIA 

Caywood, Carolyn. 1996. Selection Criteria for World Wide Web 
Resources. Public Libraries 60 (Summer): 169. 

DESIRE. Selection Criteria for Quality Controlled Information 
Gateways. Available at http://www.ukoln.ac.uk/metadata/desire/ 
quality/ toc.html. 

HealthWeb. Selection Methodology and Guidelines. Available at http:/ / 
healthweb.org/guidelines.cfm. 

infor-quality-1. Selection Criteria for Internet Information Resources: A 
Poll of Members of infor-quality-1. Available at http:/ / 
www.vuw.ac.nz/~agsmith/evaln/poll.htm. 
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Librarians' Index to the Internet. Selection Criteria for Adding 
Resources to the LII. Available at http://www.lii.org/search/file/ 
pubcriteria. 

McGeachin, Robert. 1998. Selection Criteria for Web-Based Resources 
in a Science and Technology Library Collection. Issues in Science and 
Technology Librarianship 18 (Spring). Available at http:/ / 
www.library.ucsb.edu/istl/98-spring/article2.html. 

MedWeb. Guidelines for Inclusion of Sites in MedWeb. Available at 
http://www.medweb.emory.edu/MedWeb/history.htm. 

Pratt, Gregory F. 1996. Guidelines for Internet Resources Selection. 
College & Research Libraries News 3 (March): 134-1 35. 

Scout Report. Selection Criteria. Available at http:/ / scout.cs.wisc.edu. 

WWW-vlib. Summary of Selection Criteria. Available at http:/ / 
lists.w3.org/Archives/Public/www-vlib/msg00276.html. 



A. ACCESS: RESOURCE DISCOVERY AND 
ADDED-VALUE FUNCTIONS 



4.1. Resource Discovery 

Many of the difficulties encountered by end users when searching 
the Web also confront library subject specialists and technology ex- 
perts in their efforts to select free Web sites. Identifying high-quality 
Web resources is labor-intensive. Properly carried out, it is the most 
challenging and potentially most costly aspect of building scalable 
and sustainable collections. Although machine harvesting appears to 
be promising, it remains in nascent stages of development and is 
available only in limited settings. Resource discovery, evaluation, 
and indexing (i.e., cataloging) are still primarily manual processes 
that require well-formed strategies and efficiencies. The most clearly 
delineated resource discovery sources and strategies identified in 
preparing this report are those used by the Social Science Informa- 
tion Gateway. They include the following: 

• joining discussion lists 

• subscribing to distribution lists and e-mail publications 

• monitoring and browsing sites 

• actively searching the Internet 

• subject catalogs 

• higher education sources 

• Internet search tools 

• sites and lists that announce new Internet resources 

• Web agents 

• searching non-Internet sources (e.g., scholarly journals, newslet- 
ters, Web reviews) 
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Resource discovery strategies and procedures outlined in the 
DESIRE Information Gateways Handbook (section 2-2) are also recom- 
mended. 

4.2. Added Value: Cataloging, Metadata, 

Search Functions 

Resource selection is not the only acquisitions function of libraries. 

To assure access, a library must provide the following range of value- 
added services: 

• content description (e.g., descriptive and subject cataloging) 

• resource organization (e.g., classification schema; indexing services) 

• collection maintenance (e.g., provision of access over time, preser- 
vation, archiving, deselection) 

These same responsibilities pertain to free Web resources. There 
is not yet full agreement on whether traditional cataloging practices 
(MARC, LCSH, MESH) and classification schema (Library of Con- 
gress or UDC) adequately describe digital formats or satisfactorily 
serve users. Some experts recommend the creation of MARC records 
stored in traditional OPACs, while others call for new methods of 
description and record storage. Regardless of the descriptive rules 
and type of catalog or database selected, there is a consensus that the 
minimum identification and retrieval data are as follows: 

• title /name of resource 

• location of resource (URL) 

• author or editor (i.e., creator(s) of resource and of its intellectual 
content) 

• publisher (i.e., organization making the resource accessible) 

• free-text description, including audience 

Other elements recommended for inclusion in the catalog record 
are those developed by the Dublin Core Metadata Initiative. They 
include the following: 

• subject 

• contributor 

• date (created, last modified, data gathered) 

• type (collection, database, guide/gateway, organization, service, 
home page, news service) 

• format 

• identifier 

• source 

• language 

• relation (e.g., is part of, has part of, is a version of, replaces, is ref- 
erenced by, is based on) 

• coverage (geographic and temporal) 

• rights 
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Unrelated to the specific retrieval data one should record and 
the format in which it should be recorded (e.g., MARC, Dublin 
Core), other cataloging issues need to be resolved. For example, if a 
resource points to other sites, one should determine whether each 
site requires its own unique record or whether a record for the "pri- 
mary site" or collection is appropriate. Similarly, at what level is cat- 
aloging content adequate? What level of granularity should the cata- 
loging record reflect? How these questions are answered will 
determine the quality of the collection. Other issues suggest that li- 
brarians may need to rethink traditional library cataloging practices, 
lest metadata cataloging backlogs equal or surpass the backlogs of 
uncataloged print resources stored in research libraries throughout 
the world. 



RECOMMENDED EXAMPLES OF 
VALUE-ADDED SERVICES 

Baruth, Barbara. 2000. Is Your Catalogue Big Enough to Handle the 
Web? American Libraries 31(August): 56-60. 

DESIRE Information Gateways Handbook. 2.4 Cataloguing. Available at 
http://www.desire.org/handbook/2-4.html. 

Dublin Core Metadata Initiative. Available at http:/ /dub lincore.org/. 

Humbul Humanities Hub. Describing and Cataloging Resources, 
version 1.0 (modified 20 Feb., 2001). Available at 
http://www.humbul.ac.uk/about/catalogue.html. 

MacCall, Steven L., Ana D. Cleveland, and Ian E. Gibson. 1999. 

Outline and Preliminary Evaluation of the Classical Digital Library 
Model. In Knowledge, Creation, Organization, and Use. Proceedings 
of the 62nd ASIS Annual Meeting, Washington, D.C., October 31- 
November 4, 1999: Medford, N .J.: Information Today. 

RENARDUS. Executive Summary. Available at 
http://www.renardus.org/deliverables/d6_l/doc0002.htm. 

ROADS Cataloguing Guidelines. Available at 
http://www.ukoln.ac.uk/metadata/roads/cataloguing/. 

Sowards, Steven W. 1998. A Typology for Ready Reference Web Sites in 
Libraries, firstmonday Peer-Reviewed Journal of the Internet 3(5). (See 
Elements in Typology of Ready Reference Web Site Designs, pp. 5-6.) 
Available at http://www.firstmonday.org/issues/issue3_5/ sowards/ 
index.html. 

University of Virginia Libraries. 1998. Ad Hoc Committee on Digital 
Access. Final Report. Approved June 15, 1998. 
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5. DATA MANAGEMENT: COLLECTION MAINTENANCE, 
MANAGEMENT, AND PRESERVATION 



Once created, collections of free Web resources require maintenance. 
Unlike print resources, free sites are highly dynamic. Content chang- 
es or is revised rapidly. In the context of higher education, supersed- 
ed content can be crucial. In the print environment, superseded con- 
tent can be easily retained or, if transferred to another location, 
relatively easily retrieved. The Web does not guarantee equivalent 
availability and access to superseded information. Unless archived 
and readily available through that archive, superseded information 
ceases to exist. Because of this ephemeral aspect of the Web, effective 
maintenance of collections of Web resources is labor-intensive and 
calls for a long-term staffing commitment. Collection maintenance 
can be among the more costly aspects in building and managing 
scalable and sustainable collections of Web resources. Archiving 
"snapshots" of Web resources as they exist at any given moment re- 
quires staff time and server space. Providing a mirror site for infor- 
mation developed and maintained by a third party is also costly, but 
not necessarily prohibitive, if the need to preserve or mirror is ac- 
knowledged to be within the scope of the collection. The National 
Library of Australia Pandora Project recognizes its obligation to pro- 
vide indefinite access to the Web sites it selects for the project, and it 
archives these sites at the time they are cataloged. The largest and 
best-known archiving project is Brewster Kahle's Internet Archive, 
which employs Web-crawling robot software to collect Web pages 
from publicly accessible Web servers and examines links on these 
pages to locate, evaluate, and archive yet additional pages. 

Maintenance tasks include the following: 

• Link checking. Among the most persistent problems associated 
with collections of free Web resources are dead links. Various soft- 
ware programs are available to monitor links; these programs can 
be programmed to run at predetermined intervals. Recommended 
intervals range from once a week to once every three months. 
Longer intervals have proved counterproductive. 

• Reviewing error codes. Perhaps the most frequently encountered 
error code is "403 Page not found." Resource locators frequently 
change, but the old URL may not point to the new location. Soft- 
ware will report these changes; staff members need to update all 
appropriate links. 

• Reviewing content. Because content frequently changes, staff 
should regularly confirm that it remains consistent with descrip- 
tions found in cataloging records and that it continues to conform 
to the collection scope and policy. 

• Revising cataloging records. Link checking and content review will 
determine whether and to what extent cataloging records should 
be revised. 
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• Deselection. Cataloging records and links should be deleted when 
a site can no longer be found or when its content no longer con- 
forms to the collection scope and policy. Pointing to content that 
cannot be located discourages users from using the collection. 
Similarly, when the quality of a site has deteriorated or changed to 
the extent that it no longer meets users' needs or scope criteria, the 
site should be deselected by authorized staff. 

RECOMMENDED EXAMPLES OF 
MAINTENANCE GUIDELINES 

DESIRE Information Gateways Handbook. 2.6. Collection Management. 
Available at http://www.desire.org/handbook/2-6.html. 

Humbul Humanities Hub.4. Collection Management. Available at 
http://www.humbul.ac.uk/about/colldev4.html. 

National Library of Australia. Pandora Project. Available at 
http://pandora.nla.gov.au/. 

Nicholson, Dennis, and Alan Dawson. BUBL Information Service: 8.5 
Link Checking and Record Maintenance. In Wells, Amy Tracy, et al. 

1999. The Amazing Internet Challenge: How Leading Projects Use Library 
Skills to Organize the Web. Chicago: American Library Association. 



6. MULTILINGUALITY 



The Internet knows no national or ideological boundaries. It permits 
users to access information, regardless of the country in which the 
server hosting the resource is located. Free Web resources are created 
and maintained in all the languages of the world. American research 
libraries collect foreign-language sites that support teaching and re- 
search in language and literature programs or certain subdisciplines 
in history, art, music, medieval studies, and political science. Howev- 
er, repeated informal surveys of free Web resources offered by lead- 
ing American university libraries reveal that they neither reflect the 
breadth of non-English-language resources accessible via the Internet 
nor begin to approach the extent to which publications in languages 
other than English are represented in and continue to be acquired for 
their print collections. This is primarily because English is used so 
extensively in the Internet and because it is increasingly the language 
of choice for international communication. These two facts combine 
to encourage a regrettably large number of academics, including li- 
brarians and technical specialists, to underestimate the extent to 
which free Web resources with foreign-language content are needed 
to support higher education and research. 

Previously, these sites presented major access problems for users, 
because software needed to display non-roman alphabets and char- 
acter sets was not widely available. Inexpensive software programs 
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have now largely overcome this problem, but other challenges to in- 
corporating non-English sites remain. 

The British Resource Discovery Network (RDN) has addressed 
the issue of collecting non-English sites and recommends that inclu- 
sion should be based on appropriateness to the larger topic, scalabili- 
ty, and user demand. The RDN recommends 

• a predefined number of languages that are significant or appropri- 
ate for the subject: importance of languages other than English for 
a specific subject (e.g., Danish for sites pertaining to Kierkegaard 
or Italian for sites pertaining to opera) 

• value to user 

• scalability: strategic language-by-language expansion of a site 

The ease of displaying non-roman alphabets and character sets 
notwithstanding, including foreign-language content presents a 
range of additional challenges, including the following: 

• Data presentation. What software and standards are required to 
display, search, and retrieve foreign languages using non-roman 
fonts? Should non-roman fonts be romanized? 

• Metadata and cataloging rules. Should titles and the names of cor- 
porate bodies be translated into English? Should only English-lan- 
guage descriptive cataloging and keywords be used? Descriptions 
in two or more languages may enhance access but significantly 
increase workload. 

• Searching and browsing. Should one expect to search by language 
or by domain name to narrow search results? 

Many legitimately call for greater overarching foreign-language 
search capabilities. Research and development projects are currently 
under way to provide greater access to Web content without linguis- 
tic barriers through systems using cross-language information re- 
trieval. The goal of these systems is to create search capabilities that 
permit the retrieval of sites to be independent of the natural lan- 
guage used to state the query. The success of such systems will de- 
pend on the broad application of emerging Web standards. The myr- 
iad issues and challenges pertaining to multilinguality lie outside the 
scope of this paper and will not be addressed here. Readers interest- 
ed in these matters are referred to the following: 

Oard, Doug W. 2000. Cross-Language Information Retrieval Resourc- 
es (Overview) [last modified Nov. 24]. Available at http:/ / 
www.ee.umd.edu /medlab/ mlir / . 

Peters, Carol, and Costantino Thanos. 2000. DELOS: A Network of 
Excellence for Digital Libraries; Promoting and Sustaining Digital 
Library Research and Applications in Europe. Cultivate Interactive 1 
(July). Available at http://www.cultivate-int.org/issuel/delos/. 
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Koch, Traugott. 2000. Cross-Browsing in Renardus: Usage of Vocabu- 
laries in Renardus Gateways. Available at http://www.lub.lu.se/ 
renardus/ class.html. 

RECOMMENDED EXAMPLES OF 
MULTILINGUALITY PRACTICES 

Digital Asia Library (DAL). About DAL. Available at http:/ / 
digitalasia.library.wisc.edu/about.html. 

DESIRE Information Gateways Handbook. 2.12. Multilingual Issues. 
Available at http://www.desire.org/handbook/2-12.html. 

Jennings, Simon. 2000. RDN Collections Development Framework, 
Version 1.1 (May). Available at http://www.rdn.ac.uk/publications/ 
policy.html. 



7. USER SUPPORT 



Without publicity and promotion, a collection of Web sites can be an 
underutilized, even an unused, resource. A formal plan to inform po- 
tential users is essential. Publicity is best accomplished when collec- 
tion creators identify their user groups and develop publicity and 
training materials best suited for those users. 

Publicity can range from print media to electronic media and 
may include face-to-face presentations. In the digital environment, it 
may seem inappropriate to rely on print formats to promote Web- 
based resources, but flyers, posters, newsletters, articles and reviews 
in professional journals, and press releases remain the primary 
modes of advertising commodities and services. Print-based publici- 
ty is highly effective when directed to specific user communities, but 
traditional print formats have certain costs associated with produc- 
tion and distribution (e.g., paper and printing costs, distribution and 
advertising fees). Using e-mail for publicity purposes avoids these 
expenses, but staff must still be paid to prepare publicity. A good ex- 
ample of effective electronic publicity is the regular updates the In- 
ternet Scout Report sends to its list subscribers. 

Face-to-face presentations, workshops, conference papers, and 
poster sessions can be highly successful, but the costs associated 
with such presentations (staff time, support to prepare presentation 
materials, conference registration fees, travel and lodging) mount 
rapidly. 



RECOMMENDED EXAMPLES 
OF USER SUPPORT 

DESIRE Information Gateways Handbook. Publicity and Promotion 2.8. 
Available at http://www.desire.org/handbook/2-8.html. 
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Internet Scout Report. Available at http://scout.cs.wisc.edu/scout/ 
report. 

SOSIG. Social Science Information Gateway. Training Materials and 
Support. Available at http:/ /www.sosig.ac.uk/. 

UKOLN. The UK Office for Library and Information Networking. 
Publicising Your Project. Available at http:/ / ukoln.bath.ac.uk/ 
services/ elib / info-projects/ publicise.html. 



8. HUMAN RESOURCES: ORGANIZATIONAL AND 
FINANCIAL ISSUES 



Collections of Web resources can be built by one person working in 
relative isolation or by large collaboratives working together on-site 
or at various locations. Staff may volunteer their expertise and ser- 
vices or they may be paid. There is no preferred model; each presents 
a range of options, advantages, and disadvantages depending on the 
scope and goals of the collection. No staffing model, however, can be 
successful if it does not recognize that building and maintaining the 
collection generate specific costs and present wide-ranging issues for 
communication and workflow across organizational units. In the 
case of libraries, a subject specialist working alone may have an im- 
pact on the workflow and priorities of the cataloging, reference/in- 
structional, or technology staff. These costs and workflow issues 
must be recognized and addressed. The following section outlines 
staff skills and experience, training, individuals versus collabora- 
tives, and costs associated with staffing and managing collections. 



8.1. Staff Skills and Experience 

8.1.1. Cataloging 

Staff responsible for cataloging free Web sites benefit from broad 
training and experience in print-format description and subject cata- 
logs. The Internet Scout Project in January 2001 posted a vacancy for 
a cataloger with the following skills and experience: 

• Master of Library Science degree or corresponding experience 

• educational /professional experience in electronic and networked 
information storage 

• [Web-based] searching and retrieval 

• knowledge of 

• AACR2 

• USMARC format 

• emerging standards, such as Dublin Core 

• Library of Congress subject headings 



O 
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8.1.2. Selection 

Skilled human involvement in the selection process is one of the 
most consistently called-for components of all projects studied in 
preparing this report. Harvester software may meet to a limited de- 
gree specific predetermined criteria for selecting resources, but only 
experienced subject experts (i.e., bibliographers, content experts, 
scholars) possess the level of knowledge required to select high-qual- 
ity resources. Nonetheless, free third-party Web resources exhibit 
sufficiently different traits and characteristics from print and analog 
resources in terms of origin, content, authorship, access, and storage 
(archiving) that even the most skilled and experienced bibliographer 
of print publications would have difficulty in applying time- tested 
principles and guidelines for evaluating print and analog formats to 
scholarly resources in digital format. Other guidelines are needed. 
Some might argue that the selector trained in traditional collection- 
development practices is not the most appropriate person to identify, 
evaluate, and select free third-party Web resources. Some might even 
argue that the traditional bibliographer or selector is unprepared for 
the task at hand and that selection of Web resources more appropri- 
ately belongs in the realm of reference librarians, library technology 
staff, or other subject experts (e.g., advanced graduate students or 
faculty members). Such experience or training assures that those se- 
lecting for the collection understand user needs and expectations, 
and that they can base selection on a knowledge of the relevance and 
value of resources to the target audience. Subject experts are superior 
to harvest software because they can evaluate content critically and 
in a manner that harvesters have yet to master. Subject specialists 
should also be prepared to provide end-user training. Staff responsi- 
ble for developing the intellectual scope and quality of collections 
should have experience developing analog collections or formal aca- 
demic training in pertinent subject areas or both. 

8.1.3. Technical Support 

Central to successful Web resource collections is staff with excellent 
technical skills, regardless of the size of the collection. The role of 
staff is fundamental to the organization, access, and ongoing mainte- 
nance of the collection. Typical responsibilities of technical staff in- 
clude the following: 

• technical understanding of networked environment 

• programming and scripting skills 

• infrastructure software evaluation, selection, and maintenance 

• interface development 

• archival storage 

• mirror site support (where appropriate) 

8.1.4. Project Manager 

The number and level of staff depend on the scope of the project. 
Large projects benefit from managers who can provide broad over- 
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sight and coordination. Persons with project management responsi- 
bilities should possess both subject and technical knowledge. 

RECOMMENDED STAFFING SKILLS 

DESIRE Information Gateways Handbook . 1.3. Staff and Skills Required 
Overview. Available at http://www.desire.org/handbook/l-3.html. 

DESIRE Information Gateways Handbook. 2.1. Quality Selection. 

Available at http://www.desire.org/handbook/2-l.html. 

Jennings, Simon. 2000. RDN Collections Development Framework. 
Version 1.1 (May). Available at http://www.rdn.ac.uk/publications/ 
policy.html. 

8.1.5. Advisory Boards 

It is commonly accepted practice to appoint advisory boards to large 
projects and to those of extended duration. Board members should 
include subject specialists and technical experts. Their role is to 
shape the overall goals and objectives of the collection, to confirm 
that the project remains on course over time, and to address emerg- 
ing issues. 

RECOMMENDED ADVISORY BOARD MODELS 

BIOME Special Advisory Group on Evaluation. Available at 
http: / /biome.ac.uk/sage/ . 

Edinburgh Engineering Virtual Library (EEVL). Annual Report to the 
eLib for the Period from 1st August 1995 to 31st July 1996. 1.2 Project 
Infrastructure. Available at http://www.eevl.ac.uk/ document.html. 



8.2. Staff Training 

The nature of the Web and the characteristics of free Web resources 
challenge traditional collection-development and -management prac- 
tices. This reality requires that staff receive training and supervision. 
The DESIRE Handbook recommends developing the following: 

• exercises and examples for evaluating Web sites 

• online tutorials 

• staff manuals 

• process to review sites selected by staff 

• group e-mail lists to discuss and debate quality issues 

• editorial meetings 

Many quality sites are added to collections through user sugges- 
tions by means of "Contact us" or "Add new resource" buttons. 
Training users of such sites to become informed selectors is neither 
appropriate nor feasible; however, some means of quality control 
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should be maintained. The Humbul Humanities Hub has a policy 
that requires contributions from users whose credentials and selec- 
tion criteria are unknown or cannot be judged to be reviewed, evalu- 
ated, and cataloged by staff. 

RECOMMENDED STAFF TRAINING PRACTICES 

DESIRE Information Gateways Handbook. 1.3. Staff and Skills Required 
Overview. Available at http://www.desire.org/handbook/l-3.html. 

DESIRE Information Gateways Handbook . 2.1. Quality Selection: Training 
Staff. Available at http:/ / www.desire.org/handbook/2-l.html. 

Humbul Humanities Hub.3. Collection Management. Available at 
http: / / www.humbul.ac.uk/about/colldev3.html. 

Jennings, Simon. 2000. RDN Collections Development Framework. 
Version 1.1 (May). Available at http://www.rdn.ac.uk/publications/ 
policy.html. 



8.3. Financial Issues 

This report concerns building sustainable collections of free third- 
party Web resources — resources to which anyone can have full ac- 
cess without compensating the creator or host site. "End-user access 
without compensation" is the extent to which these resources are 
free. All value-added services that libraries provide to ensure im- 
proved access have significant costs. Value-added services for analog 
collections (e.g., selecting, describing, organizing, and storing) have 
specific costs associated with them. These costs are well-known to 
administrators and rather well documented in professional litera- 
ture. Value-added services for free Web resources have similar costs; 
however, few people are aware of those costs, and comparative cost 
data, across collections or institutions, are not readily available. 

8 . 3 . 1 . Staffing 

A good source for cost data is grant proposal budgets. One three- 
year project with a staff of 5.5 full-time equivalent (FTEs) estimated 
that total personnel costs would be $714,633 over the life of the 
project. The principal investigator's home institution agreed to cover 
the cost of equipment and software, which totaled $22,300. Exclud- 
ing overhead, the total budget for this three-year project was 
$736,933. The principal investigator proposed developing a collec- 
tion of "up to 10,000 sites," possibly fewer. The average cost per site 
in this project would be $73.69 if the project met its goal of 10,000 
sites; the cost would be higher if fewer were selected. One might 
question whether the costs associated with this project reflect the av- 
erage cost associated with building similar subject gateways. At the 
very least, this case demonstrates that there are identifiable costs as- 
sociated with building collections of free Web sites. Except for the 
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absence of a purchase price, the nature of these costs closely resem- 
bles costs associated with acquiring, cataloging, and maintaining an- 
alog collections: staff costs for selection and description; technology 
costs for storage and retrieval. 

8.3.2. Sustainability and Related Costs 

Building sustainable programs of any kind requires moving from 
specific projects, which by their nature lie outside the scope of long- 
term institutional goals, to programs that are integral to the institu- 
tion's goals and mission. How one elects to build sustainable collec- 
tions of free, third-party Web resources has a direct effect on human 
resources, organizational models, and budgets. Staff members re- 
sponsible for selecting and cataloging analog materials have full- 
time jobs. Increasing their responsibilities to include developing col- 
lections of free Web resources calls into question preexisting 
priorities (i.e., developing analog collections and other responsibili- 
ties). Creating opportunities for selectors to select free sites directly 
impinges on processing workflows. Which is a higher priority: pro- 
cessing new books that are not free, or cataloging free Web sites? If 
processing units receive additional staff to handle increased work- 
loads stemming from the need to catalog free sites, have subject spe- 
cialists received commensurate time to select and evaluate them? 
Further, what plans and provisions have been made to allow the req- 
uisite technical support of selectors' and catalogers' efforts? These 
questions underscore that new priorities in one area have a direct 
impact on workloads and priorities in other units. 

Selecting free sites, whether a small number for inclusion in the 
OPAC or an entire collection to be maintained as a subject gateway, 
requires planning. Planning, in turn, requires that library managers 
understand and acknowledge that Web site selection is a new li- 
brary-delivered service, or range of services, with specific and 
unique needs and with intrinsic and far-reaching implications. Work- 
flows associated with analog collections run fairly independently of 
one another. After selection, the order is given to the acquisitions 
staff, who place the order, receive the item, and process the invoice 
before forwarding the item to the cataloging unit. Catalogers for- 
ward the item to staff, who apply call numbers, and other staff place 
the item on the shelf. There is rarely a need for cross-functional com- 
munication in the analog environment. 

This is not the case with free Web sites. Each decision in the se- 
lection, cataloging, storage, and retrieval-interface process impinges 
on the process as a whole. Selecting free sites is a new responsibility. 
Thus, if existing staff begin selecting free sites, who will take on the 
work they previously handled? How will catalogers handle new for- 
mats for describing these resources? Can existing cataloging staff as- 
sume this responsibility without additional training, and who will 
continue the cataloging of analog materials? Can technical staff effec- 
tively provide access to these new resources? Do they have and un- 
derstand how to use Unicode-compliant software? Can they cope 
with the new challenges of working with records based on Dublin 
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Core? These are substantive questions that libraries are struggling to 
answer. Understanding that there is no single right answer, and rec- 
ognizing that every policy decision can have a direct and significant 
impact on work units that in the analog context coexisted without 
having an impact on each other's procedures or priorities will facili- 
tate scalability. 

Beginning with staffing choices and continuing through selection 
policy, cataloging practices, and interface design choices, building 
collections of free Web sets presents staff and administrators with a 
series of related issues that early in the planning process reveal 
themselves only superficially. For this reason, these issues require 
greater evaluation and consideration by all participants in the collec- 
tion-building process and in looking at the process in its totality. 
Building collections cannot succeed if the process is viewed as a se- 
ries of steps that coexist but do not influence or impede each other. 
Building collections of free Web resources must be viewed as a con- 
tinuum — as a series of interdependent steps. Each component part 
has potential and probable influence and impact on one or more of 
the other parts. The scope of the collection influences selection, 
which, in turn, influences cataloging decisions. Technical limitations 
may determine the collection scope, cataloging practices, or other 
aspects of the collection. Understanding the range of issues and alter- 
natives the collection will require and how they will affect each other 
will encourage the creation of multifunctional or cross-functional 
units that facilitate communication among those who must learn 
new skills (e.g., metadata formats) in order to provide new services 
(e.g., subject gateways). Undoubtedly, the staffing models created for 
project-based development of free Web sites will influence, if not de- 
termine, staffing needs and patterns for developing and maintaining 
sustainable collections of free Web resources. 

8.3.3. Staffing Models: The Individual versus the Collaboratory 

8.3.3.1. Individual Initiatives. Many outstanding collections have been 
built through the efforts of one person alone. Subject experts and col- 
lection curators are well positioned to identify resources relevant to 
their respective fields of expertise. The inefficiencies of this approach 
are numerous, but not necessarily so great as to rule out this ap- 
proach in all contexts. Individuals can make important contributions 
if their collections are narrowly focused or specialized. 

8.3. 3. 2. Departmental Initiatives. A library, a unit within a library, or a 
unit outside a library (e.g., an academic department) can quite effec- 
tively build collections. Subject pages and guides permeate home 
pages for libraries and academic departments. Even a cursory review 
of these sites will reveal a high degree of redundancy — 75 percent or 
higher. This duplication of effort may not be "bad" or "wrong," but 
it should call for close consideration and evaluation. Departmental 
initiatives serve many purposes and may be highly successful within 
their specific context. Their greatest weakness may be that they re- 
flect the traditional "pride of place" and institutional reputation that 
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have driven the building of print and analog collections — a reality 
created and encouraged by the nature of physical collections. Web- 
based resources do not have the fiscal barriers to access that charac- 
terize print-based materials. Why then build redundant collections 
that are unique only in their brand or URL? 

8. 3.3. 3. Managed Collaboration. A review of stated and implied practic- 
es used in facilitating access to free Web sites suggests that the 
growth of these sites is too great to permit a single individual or in- 
stitution to adequately identify and build collections in a timely 
manner. The Web is simply too vast. OCLC staff in 1999 described 
the number of Web sites as doubling annually; at the same time, half 
of all Web sites disappear each month. In other words, approximate- 
ly 55 percent of all Web sites available on any given day did not exist 
one month earlier. Such statistics demonstrate the volatile and dy- 
namic nature of the Web. If this growth and volatility continue, li- 
brarians will be well advised to emulate the collaborative Web har- 
vesting projects of their colleagues throughout Western Europe and 
in Australia and New Zealand, where projects such as Resource Dis- 
covery Network (RDN), Social Science Information Gateway (SOS- 
IG), Humbul Humanities Group, Finnish Virtual Library (FVL), EU- 
LER, and Pandora have advanced rapidly. Because these projects rely 
on collaboration among staff at multiple institutions and/ or among 
special project staff, they have accomplished what no individual or 
single institution working in isolation can achieve: rapid and effi- 
cient collection development of nonredundant collections at a rea- 
sonable cost. In North America, the Internet Scout Report and the 
Digital Asia Library are two examples of specially funded projects 
staffed with full-time teams of subject specialists, technical experts, 
and metadata catalogers. These projects further illustrate that suc- 
cessful harvesting of high-quality Web sites is neither a part-time job 
nor an added responsibility for staff who are primarily accountable 
for other duties. In addition, discussions under way within ARL and 
various consortia underscore that successful mining of Internet re- 
sources will require libraries to provide users with vertical (i.e., 
deep) searching of Web content, not merely the horizontal (i.e., su- 
perficial) searching of sites typically provided by popular Web 
browsers (Campbell 2000). 

8. 3. 3 A. Facilitated Collaboration. Facilitated collaboration is not based 
so much on shared principles, values, or aims as on the use of some 
high-level common framework for software such as DBOZ.org or the 
Cooperative Online Resource Catalog (CORC). The latter, organized 
by Netscape and others, is a collection of site reviews to which users 
may contribute. CORC is a metadata creation system for bibliograph- 
ic records and pathfinders that describes electronic resources and has 
contributors from around the world. Both cases afford major bene- 
fits: large numbers of individuals coordinated by their home institu- 
tions contributing large numbers of sites, resulting in a rapid rate of 
collection development. Shortcomings include the lack of a single. 
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overarching set of selection criteria, limited assurance that resource 
descriptions reflect current content, and uneven subject coverage. 

The highest level of collaboration is one in which participants 
recognize that decisions about metadata and controlled vocabularies 
need to be made, and that these decisions influence and determine 
collection scope, access, purpose (i.e., popular or scholarly), human 
resources, and cost. Who makes decisions and how decisions are im- 
plemented is fundamental to all forms and levels of collaboration. 



9. FUTURE DIRECTIONS: NURTURING SUSTAINABILITY 



Why collect free Web resources? The obvious answer is that current 
users need facilitated, value-added access to these resources to en- 
sure that they will retrieve sites with high-quality content. The pri- 
mary question for the future is whether broad application of en- 
hanced metadata standards and next-generation search engines will 
allow end users to mine the Web themselves with greater precision 
than is currently possible and, in so doing, bypass the current need 
for facilitated access. In other words, will there be ongoing need for 
subject specialists (content experts) to provide the services tradition- 
ally provided by bibliographers and libraries? 

For the foreseeable future, it is safe to say that the higher educa- 
tion community will remain dependent on collections of high-quality 
resources selected and described by experts using the practices out- 
lined in this report. Near-term prognostications do not call for the 
subject expertise of humans to be replaced by computer-based search 
capabilities. Instead, the higher education community will grow in- 
creasingly dependent on free Web content made available through 
expanded human efforts to winnow, sift, and deliver access to a larg- 
er percentage of the Web's high-quality resources. Among the near- 
term future developments will be the following: 

• increased outreach to user groups 

• increased reliance on collaborative collection development 

• greater emphasis on underrepresented subjects and non-text- 
based formats 

• development of instructional support through course-specific col- 
lections or browsing by course number 

• in-depth mining of distributed databases as foreseen in the Asso- 
ciation of Research Libraries' scholars portal model 

• simultaneous searching of analog and Web-based resources 
through the integration of distributed catalogs of Internet resourc- 
es and library OPACs 

• increased acceptance of internationally recognized cataloging 
standards 

• increased control of URLs and descriptive metadata to reduce or 
eliminate broken links 
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• broad use of harvesting software to collect embedded metadata 
and thus facilitate the rapid cataloging of sites and eliminate re- 
dundant efforts 

• decrease in manual harvesting and cataloging 

Until now, building collections of free Web resources has been 
modeled on time-honored practices for building print and analog 
collections: an informed winnowing-and-sifting process that entails 
application of predetermined criteria and the exercise of human 
judgment. The Web is far too vast, its resources far too rich, for these 
same practices to prove successful over time. The size of the Web al- 
ready exceeds human ability to review, organize, and manage collec- 
tions at the level required to sustain the needs and priorities of high- 
er education. The dynamic nature of the Web, particularly of its free 
resources, will render unviable manual review and cataloging. Re- 
search libraries are approaching an environment in which selection 
and cataloging of free Web resources will be machine-driven. Hu- 
mans will develop selection criteria, but machines will apply them 
and accept or reject resources at a speed that only computers can de- 
liver. Descriptive and subject analysis will be drawn from metadata 
embedded within the resources themselves by their creators. 

Successful machine harvesting and cataloging techniques have 
yet to be perfected. The automatic methods that are currently under 
development, however, appear promising. The Library of Congress's 
Minerva project and the Swedish Royal Library Kulturarw3 project 
are examples of how sustainability of free Web resource collections 
will be achieved. How rapidly the process will be automated re- 
mains unclear. How easily automated procedures will be widely em- 
ployed remains uncertain. Until technology can facilitate the harvest- 
ing and cataloging processes, manual practices will continue to be 
used and will be the foundation upon which a successful automated 
process is built. 

Discussions of these topics and examples of current research 
projects in these areas are available at the following works and sites: 

Arms, William Y. 2001. A Report to the Library of Congress: Web 
Preservation Project, Interim Report. Cornell University. Available at 
http://www.cs.comell.edu. /wya/LC-web/. 

Campbell, Jerry. 2000. The Case for Creating a Scholars Portal to the 
Web: A White Paper. ARL Newsletter 211. Available at http: / / 
www.arl.org/ newsltr/ 211 / portal.html. 

Dublin Core. Available at http:/ /dublincore.org/. 

Platform for Internet Content Selection (PICS). Available at http: / / 
www.w3c.org/PICS. 

Resource Description Framework (RDF). Available at http:/ / 
www.w3org/TR/REC-rdf-syntax. 
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Resource Organization And Discovery in Subject-based services 
(ROADS). ROADS Harvester software development. Available at 
http: / / www.ukoln.ac.uk/ metadata /software-tools/. 

Royal Library of Sweden. Kulturarw 3 Heritage Project. Available at 
http://kulturarw3.kb.se/html/kulturarw3.eng.html. 
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